-
Notifications
You must be signed in to change notification settings - Fork 15.1k
Optimize vector::assign for InputIterator-only pair inputs #113852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize vector::assign for InputIterator-only pair inputs #113852
Conversation
|
@llvm/pr-subscribers-libcxx Author: Peng Liu (winner245) ChangesSummaryThis PR optimizes the Details
TestingBenchmark tests (Quick-Bench Results) show significant performance improvements for test cases with pre-populated elements where the vector sizes are about the same before and after assignment:
where Full diff: https://github.com/llvm/llvm-project/pull/113852.diff 1 Files Affected:
diff --git a/libcxx/include/__vector/vector.h b/libcxx/include/__vector/vector.h
index 7889e8c2201ac1..6c37c3113a536a 100644
--- a/libcxx/include/__vector/vector.h
+++ b/libcxx/include/__vector/vector.h
@@ -1031,9 +1031,14 @@ template <class _Tp, class _Allocator>
template <class _Iterator, class _Sentinel>
_LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI void
vector<_Tp, _Allocator>::__assign_with_sentinel(_Iterator __first, _Sentinel __last) {
- clear();
- for (; __first != __last; ++__first)
- emplace_back(*__first);
+ pointer __cur = __begin_;
+ for (; __first != __last && __cur != __end_; ++__cur, ++__first)
+ *__cur = *__first;
+ if (__cur != __end_)
+ __destruct_at_end(__cur);
+ else
+ for (; __first != __last; ++__first)
+ emplace_back(*__first);
}
template <class _Tp, class _Allocator>
|
9f5bbf5 to
83d79b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor comments. This is great!
| c1 = c2; | ||
| DoNotOptimizeData(c1); | ||
| DoNotOptimizeData(c2); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not attached to this line: Can you please add a release note to 20.rst mentioning this optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your positive feedback and recognition of my work! I appreciate your time and effort in reviewing this PR. I have added a description of this performance optimization to the release notes and rebased the PR onto the main branch. Thanks again for your help and support!
83d79b3 to
03b8721
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
It would be great if @philnik777 or @frederick-vs-ja could also have a look to make sure I didn't miss something related to conformance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation itself LGTM, but I think we want to rework the benchmarks. I'd also like to see the actual benchmark results and would rather have some text in the commit message than graphics, since I don't think the graphics will show anywhere except on GitHub.
c5b2e3f to
87f94b9
Compare
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
std::vector::assign(first, last)
549ba00 to
22e78f4
Compare
a2e4ff3 to
6112450
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code changes still LGTM, I have a few comments on the benchmarks and most importantly I'd like @philnik777 to chime in to say whether he's satisfied with the benchmarks, since he had requested some changes.
| } | ||
|
|
||
| template <class IntT> | ||
| inline std::vector<std::vector<IntT>> getRandomIntegerInputsWithLength(std::size_t N, std::size_t len) { // N-by-len |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| inline std::vector<std::vector<IntT>> getRandomIntegerInputsWithLength(std::size_t N, std::size_t len) { // N-by-len | |
| std::vector<std::vector<IntT>> getRandomIntegerInputsWithLength(std::size_t N, std::size_t len) { // N-by-len |
inline not needed since this is a template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
std::vector::assign(first, last)std::vector::assign for InputIterator-pair inputs
std::vector::assign for InputIterator-pair inputs6112450 to
4fa53ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, but please wait for @philnik777 to stamp this since he had comments.
|
@philnik777 Thank you for the approval! I appreciate your time. |
As a follow-up to #113852, this PR optimizes the performance of the `insert(const_iterator pos, InputIt first, InputIt last)` function for `input_iterator`-pair inputs in `std::vector` for cases where reallocation occurs during insertion. Additionally, this optimization enhances exception safety by replacing the traditional `try-catch` mechanism with a modern exception guard for the `insert` function. The optimization targets cases where insertion trigger reallocation. In scenarios without reallocation, the implementation remains unchanged. Previous implementation ----------------------- The previous implementation of `insert` is inefficient in reallocation scenarios because it performs the following steps separately: - `reserve()`: This leads to the first round of relocating old elements to new memory; - `rotate()`: This leads to the second round of reorganizing the existing elements; - Move-forward: Moves the elements after the insertion position to their final positions. - Insert: performs the actual insertion. This approach results in a lot of redundant operations, requiring the elements to undergo three rounds of relocations/reorganizations to be placed in their final positions. Proposed implementation ----------------------- The proposed implementation jointly optimize the above 4 steps in the previous implementation such that each element is placed in its final position in just one round of relocation. Specifically, this optimization reduces the total cost from 2 relocations + 1 std::rotate call to just 1 relocation, without needing to call `std::rotate`, thereby significantly improving overall performance.
This PR optimizes the input iterator overload of
assign(_InputIterator, _InputIterator)instd::vector<_Tp, _Allocator>by directly assigning to already initialized memory, rather than first destroying existing elements and then constructing new ones. By eliminating unnecessary destruction and construction, the proposed algorithm enhances the performance by up to 2x for trivial element types (e.g.,std::vector<int>), up to 2.6x for non-trivial element types likestd::vector<std::string>, and up to 3.4x for more complex non-trivial types (e.g.,std::vector<std::vector<int>>).Google Benchmarks
Benchmark tests (
libcxx/test/benchmarks/vector_operations.bench.cpp) were conducted for theassign()implementations before and after this patch. The tests focused on trivial element types likestd::vector<int>, and non-trivial element types such asstd::vector<std::string>andstd::vector<std::vector<int>>.Before
After